feat(schema): lifecycle entity rows for b2b_saas_ltv_v1 [LTV-Pb] #104

Merged
shaypal5 merged 2 commits into main from feat/ltv-lifecycle-entity-rows
Jun 10, 2026
@shaypal5

Summary

First implementation PR of the LTV workstream (LTV-Pb, milestone
LTV-M1). Adds the schema foundation for the post-conversion lifecycle
bundle (b2b_saas_ltv_v1), fully decoupled from the lead-scoring catalog.

New entity rows (leadforge/schema/entities.py)

  • SubscriptionEventRow (subscription_events) — lifecycle state changes
    (renewal / expansion / downgrade / churn / payment_failure / payment_recovered).
  • HealthSignalRow (health_signals) — weekly product-usage telemetry.
  • InvoiceRow (invoices) — monthly billing; the unit of pLTV value.
  • CustomerLifecycleRow / SubscriptionLifecycleRow — richer
    customers/subscriptions for the lifecycle bundle.

Key design decision — dedicated classes, not in-place extension

The roadmap originally said "extend CustomerRow/SubscriptionRow." During
implementation I found that EntityRow.to_dict() emits every dataclass
field, so adding fields in place would silently change the lead-scoring
instructor bundle's customers/subscriptions parquet schema (and break
its contract tests). Instead, the lifecycle bundle uses dedicated
CustomerLifecycleRow / SubscriptionLifecycleRow classes (reusing the
logical table names) kept in a separate registry. This faithfully realizes the
"lead-scoring output unchanged" requirement. docs/ltv/design.md §4.2/§10
updated to record this.
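
The schema-drift concern above can be illustrated with a minimal sketch. It assumes `to_dict()` behaves like `dataclasses.asdict`, and every field name except `opportunity_id` is hypothetical; the real row classes in leadforge/schema/entities.py will have different fields.

```python
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class EntityRow:
    """Stand-in base class (assumption): to_dict() emits every dataclass field."""

    def to_dict(self) -> dict:
        return asdict(self)


@dataclass
class CustomerRow(EntityRow):
    # Lead-scoring row; these two field names are illustrative only.
    customer_id: str
    account_id: str


@dataclass
class CustomerLifecycleRow(EntityRow):
    # Dedicated lifecycle row that reuses the logical table name "customers".
    customer_id: str
    account_id: str
    opportunity_id: Optional[str] = None  # nullable, reserved for future chaining


# The lead-scoring column set is derived from CustomerRow alone, so it stays
# the same no matter what the lifecycle class adds.
lead_cols = [f.name for f in fields(CustomerRow)]
lifecycle_cols = list(CustomerLifecycleRow("cust_1", "acct_1").to_dict())
```

Had `opportunity_id` been added to `CustomerRow` directly, `lead_cols` would have grown by one column, and so would any parquet file written from that row type.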

Separate registries (lead-scoring catalog untouched)

  • LIFECYCLE_ROW_TYPES / LIFECYCLE_TABLE_NAMES (6 tables: accounts reused +
    customers, subscriptions, subscription_events, health_signals, invoices).
  • LIFECYCLE_CONSTRAINTS — 6 FK edges; customers → accounts only (no
    customer → opportunity FK under independent generation; opportunity_id
    is a nullable column reserved for future chaining).
  • ID_PREFIXES gains subscription_event/health_signal/invoice
    (subev/hsig/inv).
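
As a rough sketch of how one such FK edge might be checked: the `FKConstraint` shape, the `find_orphans` helper, and the sample ID values below are all assumptions for illustration, not the actual leadforge API. Only the customers → accounts edge named above is shown; the other five edges are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FKConstraint:
    # Hypothetical shape; the real FKConstraint fields are not shown in this PR.
    child_table: str
    child_column: str
    parent_table: str
    parent_column: str


# The single edge described in the text: customers -> accounts.
CUSTOMER_TO_ACCOUNT = FKConstraint("customers", "account_id", "accounts", "account_id")


def find_orphans(constraint, tables):
    """Return child rows whose FK value has no matching parent key."""
    parent_keys = {row[constraint.parent_column] for row in tables[constraint.parent_table]}
    return [
        row
        for row in tables[constraint.child_table]
        if row[constraint.child_column] not in parent_keys
    ]


tables = {
    "accounts": [{"account_id": "acct_1"}],
    "customers": [
        # opportunity_id is a plain nullable column: no constraint inspects it.
        {"customer_id": "cust_1", "account_id": "acct_1", "opportunity_id": None},
        {"customer_id": "cust_2", "account_id": "acct_9", "opportunity_id": None},
    ],
}
orphans = find_orphans(CUSTOMER_TO_ACCOUNT, tables)
```

Because no constraint references `opportunity_id`, a null value there never counts as a violation, which matches the "reserved for future chaining" note.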

ALL_ROW_TYPES (9), TABLE_NAMES, and ALL_CONSTRAINTS (10) are unchanged —
a guard test asserts it.
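
A guard of this kind could look roughly like the sketch below. The lifecycle table names come from the description above; the lead-scoring tuple is a three-table stand-in for the real nine-table registry, and the test name is hypothetical.

```python
# Stand-in for the lead-scoring registry (assumption): the real TABLE_NAMES has
# 9 entries; only the three table names mentioned in this PR are used here.
TABLE_NAMES = ("accounts", "customers", "subscriptions")

# Lifecycle registry: the six tables listed in the description.
LIFECYCLE_TABLE_NAMES = (
    "accounts",
    "customers",
    "subscriptions",
    "subscription_events",
    "health_signals",
    "invoices",
)

# Frozen snapshot the guard compares the live registry against.
EXPECTED_LEAD_SCORING_TABLES = ("accounts", "customers", "subscriptions")
LIFECYCLE_ONLY_TABLES = {"subscription_events", "health_signals", "invoices"}


def test_lead_scoring_catalog_unchanged():
    # Fails if a table is added to, removed from, or reordered in the registry.
    assert TABLE_NAMES == EXPECTED_LEAD_SCORING_TABLES
    # Fails if a lifecycle-only table leaks into the lead-scoring registry.
    assert LIFECYCLE_ONLY_TABLES.isdisjoint(TABLE_NAMES)
    # Lifecycle-only tables must still be registered on the lifecycle side.
    assert LIFECYCLE_ONLY_TABLES <= set(LIFECYCLE_TABLE_NAMES)
```

Pinning the expected names in the test, rather than deriving them from the registry, is what lets the guard catch an accidental edit to the registry itself.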

Tests

48 new in tests/schema/test_lifecycle_entities.py: to_dict parity,
empty-dataframe columns/dtypes, populated + empty parquet round-trips, registry
shape, lifecycle FK constraints, lead-scoring-catalog-unchanged guard, and ID
prefixes. tests/schema/test_ids.py expected-set updated.

  • Full suite: 1480 passed / 51 skipped (+48).
  • ruff check + ruff format --check: clean.
  • mypy leadforge/: clean (84 files).
  • No bundle regeneration; BUNDLE_SCHEMA_VERSION unchanged (the schema bump to
    6 lands with the recipe wiring in LTV-M5).
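
For the empty-dataframe columns/dtypes checks, one plausible shape is sketched below. The `empty_frame` helper and the invoice column names and dtypes are assumptions; the project's own helper and column contracts are not shown in this PR.

```python
import pandas as pd

# Hypothetical column -> dtype contract for an invoices-like table.
INVOICE_DTYPES = {
    "invoice_id": "object",
    "subscription_id": "object",
    "amount": "float64",
}


def empty_frame(dtypes):
    """Build a zero-row DataFrame whose columns carry the declared dtypes."""
    return pd.DataFrame(
        {name: pd.Series(dtype=dtype) for name, dtype in dtypes.items()}
    )


empty = empty_frame(INVOICE_DTYPES)

# The checks a columns/dtypes test would make on the empty frame.
assert len(empty) == 0
assert list(empty.columns) == list(INVOICE_DTYPES)
assert {name: str(dtype) for name, dtype in empty.dtypes.items()} == INVOICE_DTYPES
```

Typing the empty columns explicitly matters because a table with no rows otherwise carries no dtype information, so its schema could differ from a populated table's.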

Scope

Schema contracts only — no simulation, render, or recipe wiring. Next:
LTV-Pc (pLTV feature spec + regression task specs).

🤖 Generated with Claude Code

First implementation PR of the LTV workstream. Adds the schema foundation for
the post-conversion lifecycle bundle, fully decoupled from the lead-scoring
catalog so its output is unchanged.

New entity rows (leadforge/schema/entities.py):
- SubscriptionEventRow (subscription_events) — lifecycle state changes.
- HealthSignalRow (health_signals) — weekly product-usage telemetry.
- InvoiceRow (invoices) — monthly billing; the unit of pLTV value.
- CustomerLifecycleRow / SubscriptionLifecycleRow — richer customers/
  subscriptions for the lifecycle bundle. Dedicated classes rather than
  in-place extension of CustomerRow/SubscriptionRow, because to_dict() emits
  every field and extending in place would change the lead-scoring instructor
  bundle's parquet schema. opportunity_id is nullable (independent generation;
  reserved for future chaining).

Registries kept separate from the lead-scoring catalog:
- LIFECYCLE_ROW_TYPES / LIFECYCLE_TABLE_NAMES (entities.py).
- LIFECYCLE_CONSTRAINTS (relationships.py) — 6 FK edges; customers→accounts
  only (no customer→opportunity FK under independent generation).
- ID_PREFIXES gains subscription_event/health_signal/invoice (subev/hsig/inv).

ALL_ROW_TYPES (9), TABLE_NAMES, and ALL_CONSTRAINTS (10) are unchanged; a guard
test asserts this. docs/ltv/design.md §4.2/§10 updated to record the
dedicated-classes decision.

Tests: 48 new in tests/schema/test_lifecycle_entities.py (to_dict parity,
empty-dataframe columns/dtypes, parquet round-trips, registry shape, lifecycle
FK constraints, lead-scoring-catalog-unchanged guard, ID prefixes) + ID prefix
set updated. Full suite 1480 passed / 51 skipped; ruff + mypy clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 07:37
@shaypal5 shaypal5 added this to the dataset: leadforge-ltv-v1 milestone Jun 10, 2026
@shaypal5 shaypal5 added the type: feature, layer: schema, status: needs review, and dataset: leadforge-ltv-v1 labels Jun 10, 2026
Check off LTV-Pb in the roadmap and link its GitHub PR (#104); update the
.agent-plan.md status line.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Copilot AI left a comment

Pull request overview

This PR introduces the schema foundations for the new post-conversion lifecycle bundle (b2b_saas_ltv_v1) while explicitly keeping the existing lead-scoring schema/catalog unchanged by using dedicated lifecycle row classes and separate registries.

Changes:

  • Added lifecycle-specific entity row dataclasses (CustomerLifecycleRow, SubscriptionLifecycleRow, SubscriptionEventRow, HealthSignalRow, InvoiceRow) plus lifecycle registries (LIFECYCLE_ROW_TYPES, LIFECYCLE_TABLE_NAMES) without modifying ALL_ROW_TYPES/TABLE_NAMES.
  • Added lifecycle FK constraint registry (LIFECYCLE_CONSTRAINTS) separate from the lead-scoring FK graph.
  • Extended ID prefix registry to cover lifecycle-only entity types and added comprehensive schema/round-trip/guard tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| leadforge/schema/entities.py | Adds lifecycle row contracts and lifecycle-only registries while preserving the lead-scoring registries unchanged. |
| leadforge/schema/relationships.py | Introduces lifecycle-only FK constraints, kept separate from the lead-scoring FK constraint set. |
| leadforge/core/ids.py | Adds lifecycle ID prefixes (subev, hsig, inv) and documents lifecycle prefix usage. |
| tests/schema/test_lifecycle_entities.py | Adds lifecycle schema contract tests (to_dict parity, empty df schema, parquet round-trips, registry shape, FK constraints, and lead-scoring invariants). |
| tests/schema/test_ids.py | Updates expected ID-prefix coverage to include lifecycle entity types. |
| docs/ltv/design.md | Updates design documentation to reflect the “dedicated lifecycle classes + separate registry” decision and lifecycle table inventory notes. |

@github-actions

pr-agent-context report:

No unresolved review comments, failing checks, or actionable patch coverage gaps were found on PR #104 in repository https://github.com/leadforge-dev/leadforge. Treat this PR as all clear unless new signals appear.

Run metadata:

Tool ref: v4
Tool version: 4.0.21
Trigger: commit pushed
Workflow run: 27260937206 attempt 1
Comment timestamp: 2026-06-10T07:39:02.837812+00:00
PR head commit: f53d437c74fcdde96c7561dc1058ec66e7d393f5

@shaypal5 shaypal5 merged commit 12fd876 into main Jun 10, 2026
9 of 10 checks passed
@shaypal5 shaypal5 deleted the feat/ltv-lifecycle-entity-rows branch June 10, 2026 07:55
shaypal5 added a commit that referenced this pull request Jun 10, 2026
Acts on the maintainer decision that leadforge becomes a platform hosting two
PARALLEL, peer generation schemes (lead_scoring + lifecycle), not a
lead-scoring framework with an LTV bolt-on.

design.md:
- New §2.5 "peer generation schemes": decisions D10 (extract the
  GenerationScheme abstraction EARLY, against the known-good lead-scoring path,
  output byte-identical) and D11 (physically reorganize into
  leadforge/schemes/{lead_scoring,lifecycle}/ now). Adds the scheme→recipe→
  bundle hierarchy, the GenerationScheme protocol shape, a shared-envelope vs
  per-scheme table, the target package layout, reorg safety rails for the
  published 1.x package, and a note that LTV-Pb (#104) already aligns.
- §10 inventory: lifecycle modules now live under schemes/lifecycle/; adds
  schemes/base.py; recipe declares scheme: lifecycle.

roadmap.md (reshaped to 9 milestones / ~18 PRs, Pa..Pr):
- New LTV-M2 "Generation-scheme architecture + physical reorg" (LTV-Pd/Pe/Pf):
  protocol+registry against lead-scoring → move lead-scoring into
  schemes/lead_scoring/ → scaffold schemes/lifecycle/ and relocate the
  LTV-Pb/Pc specs.
- Lifecycle build milestones (population/engine/snapshots) renumbered to
  M3-M5 and now land directly under schemes/lifecycle/.
- LTV-M6 registers LifecycleScheme end-to-end + recipe + manifest
  generation_scheme + schema v6.

.agent-plan.md: scheme-architecture summary + revised status (M2 next; can run
in parallel with M1 since it only touches the existing lead-scoring path).

Stacked on the LTV-Pb branch (#104) because it references that work as done.
No package code in this PR.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
shaypal5 added a commit that referenced this pull request Jun 10, 2026
…leScheme [LTV-Pg.1] (#111)

* refactor(schema): scaffold schemes/lifecycle/ + register stub LifecycleScheme [LTV-Pg.1]

First half of the schema reorg (LTV-Pg). Gives the lifecycle scheme its own
home and makes it a registered peer of lead_scoring, ahead of building its
pipeline (M3–M6). Byte-identical; lead-scoring catalog unchanged.

- New leadforge/schemes/lifecycle/ package:
  - entities.py — the 5 lifecycle rows (CustomerLifecycleRow,
    SubscriptionLifecycleRow, SubscriptionEventRow, HealthSignalRow, InvoiceRow)
    + LIFECYCLE_ROW_TYPES / LIFECYCLE_TABLE_NAMES, moved from schema/entities.py.
    AccountRow / EntityRowProtocol / _empty_df are shared and imported from
    leadforge.schema.entities.
  - relationships.py — LIFECYCLE_CONSTRAINTS, moved from schema/relationships.py
    (reuses the shared FKConstraint).
  - __init__.py — stub LifecycleScheme (build_world/write_bundle raise
    NotImplementedError until M3–M6); self-registers. schemes/__init__ imports it.
- schema/entities.py and schema/relationships.py: lifecycle definitions removed;
  breadcrumb comments point to the new home. ALL_ROW_TYPES / ALL_CONSTRAINTS
  unchanged.
- tests/schema/test_lifecycle_entities.py → tests/schemes/lifecycle/test_entities.py
  with updated imports; tests/schemes/test_registry.py gains lifecycle
  registration + stub-raises-NotImplementedError tests.
- CHANGELOG, CLAUDE.md (both layouts), roadmap (Pg split into Pg.1/Pg.2),
  agent-plan updated.

available_schemes() → ("lead_scoring", "lifecycle"). Verified byte-identical
(14/14 files); full suite 1534 passed / 51 skipped; ruff + mypy clean (92 files).
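
A self-registering stub scheme of the kind described above might be sketched as follows. The decorator, the `name` attribute, and the method signatures are assumptions; only the scheme names, `available_schemes()`, `build_world`, `write_bundle`, and the NotImplementedError behaviour come from this commit message.

```python
_SCHEMES = {}


def register_scheme(cls):
    """Hypothetical class decorator: record the scheme under its name."""
    _SCHEMES[cls.name] = cls
    return cls


def available_schemes():
    # Names in registration order.
    return tuple(_SCHEMES)


@register_scheme
class LeadScoringScheme:
    name = "lead_scoring"

    def build_world(self, config):
        # Placeholder body; the real pipeline is out of scope for this sketch.
        return {"scheme": self.name, "config": config}


@register_scheme
class LifecycleScheme:
    name = "lifecycle"

    def build_world(self, config):
        raise NotImplementedError("lifecycle pipeline lands in M3-M6")

    def write_bundle(self, world, out_dir):
        raise NotImplementedError("lifecycle pipeline lands in M3-M6")


try:
    LifecycleScheme().build_world({})
    stub_raised = False
except NotImplementedError:
    stub_raised = True
```

Registering the stub early lets registry lookups and their tests exist before the pipeline does, while any attempt to generate data fails loudly instead of silently producing nothing.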

Note: class-level extraction from the shared schema/entities.py can't be a git
rename (multiple classes pulled from a multi-class file); the lifecycle rows
were only added in #104 so the history loss is shallow.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(ltv): record LTV-Pg.1 (#111) in roadmap + agent-plan [LTV-Pg.1]

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* refactor(schema): make_empty_dataframe public (self-review) [LTV-Pg.1]

Self-review: schemes/lifecycle/entities.py imported a private symbol
(`_empty_df`) across packages — a leading-underscore name signals
module-internal, so importing it elsewhere is a smell. Promote it to a public
shared helper `make_empty_dataframe` in leadforge.schema.entities (used by both
the lead-scoring rows and the lifecycle rows); the cross-module import is now
legitimate.

No behaviour change (verified byte-identical, 14/14); full suite passes; ruff +
mypy clean. (When LTV-Pg.2 moves lead-scoring rows out of schema/entities.py,
make_empty_dataframe + EntityRowProtocol stay as the shared primitives.)

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>